Paper 1323-2017: Real AdaBoost: Boosting for Credit Scorecards and Similarity to WOE Logistic Regression

نویسندگان

  • Paul K. Edwards
  • Dina Duhon
  • Suhail Shergill
چکیده

Adaboost is a machine learning algorithm that builds a series of small decision trees, adapting each tree to predict difficult cases missed by the previous trees and combining all trees into a single model. We will discuss the AdaBoost methodology and introduce the extension called Real AdaBoost. Real AdaBoost comes from a strong academic pedigree: its authors are pioneers of machine learning and the method has well-established empirical and theoretical support spanning 15 years. Practically speaking, Real AdaBoost is able to produce readable credit scorecards and offers attractive features including variable interaction and adaptive, stage-wise binning. We will contrast Real AdaBoost to the dominant methodology for creating credit scorecards: stepwise weight of evidence logistic regression (SWOELR). Real AdaBoost is remarkably similar to SWOELR and is well positioned to serve as a benchmark for SWOELR models; it may even offer a statistical framework by which we can understand the power of SWOELR. We offer a macro to generate Real AdaBoost models in SAS. INTRODUCTION Financial institutions (FIs) must develop a wide range of models for marketing, fraud detection, loan adjudication, etc. Modeling has undergone a recent renaissance as machine learning has exploded – spurned by the availability of advanced statistical techniques, the ubiquity of powerful computers to execute these techniques, and the well-publicized successes of the companies who have embraced these methods (Parloff 2016). Modeling departments within some FIs face opposing demands: executives want some of the famed value of advanced methods, while government regulators, internal deployment teams and front-line staff want models that are easy to implement, interpret and understand. In this paper we review Real AdaBoost, a machine learning technique that may offer a middle-ground between powerful, but opaque machine learning methods and transparent conventional methods. CONSUMER RISK MODELS One field of modeling where FIs must often strike a balance between power and transparency is consumer risk modeling. Consumer risk modeling involves ranking customers by their credit worthiness (the likelihood they will repay a loan): first by identifying customer characteristics that indicate risk of delinquency, and then combining them mathematically to calculate a relative risk score for each customer (common characteristics include: past loan delinquency, high credit utilization, etc.). CREDIT SCORECARDS In order to keep the consumer risk models as transparent as possible, many FIs require that the final output of the model be in the form of a scorecard (an example is shown in Table 1). Credit scorecards are a popular way to represent customer risk models due to their simplicity, readability, and the ease with which business expertise can be incorporated during the modeling process (Maldonado et al. 2013). A scorecard lists a number of characteristics that indicate risk and each characteristic is subdivided into a small number of bins defined by ranges of values for that characteristic (e.g., credit utilization: 30-80% is a bin for the credit utilization characteristic). Each bin is assigned a number of score points, a value derived from a statistical model and proportional to the risk of that bin (SAS 2012). A customer will fall into one and only one bin per characteristic and the final score of the applicant is the sum of the points assigned by each bin (plus an intercept). This final score is proportional to consumer risk. The procedure for developing scorecards is termed stepwise weight of evidence logistic regression (SWOELR) and is implemented in the Credit Scoring add-on in SAS® Enterprise MinerTM.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Matrix Sequential Hybrid Credit Scorecard Based on Logistic Regression and Clustering

The Basel II Accord pointed out benefits of credit risk management through internal models to estimate Probability of Default (PD). Banks use default predictions to estimate the loan applicants’ PD. However, in practice, PD is not useful and banks applied credit scorecards for their decision making process. Also the competitive pressures in lending industry forced banks to use profit scorecards...

متن کامل

A Necessary Condition for a Good Binning Algorithm in Credit Scoring

Binning is a categorization process to transform a continuous variable into a small set of groups or bins. Binning is widely used in credit scoring. In particular, it can be used to define the Weight of Evidence (WOE) transformation. In this paper, we first derive an explicit solution to a logistic regression model with one independent variable that has undergone a WOE transformation. We then u...

متن کامل

The Boosting Approach to Machine Learning An Overview

Boosting is a general method for improving the accuracy of any given learning algorithm. Focusing primarily on the AdaBoost algorithm, this chapter overviews some of the recent work on boosting including analyses of AdaBoost’s training error and generalization error; boosting’s connection to game theory and linear programming; the relationship between boosting and logistic regression; extension...

متن کامل

Does segmentation always improve model performance in credit scoring?

Credit scoring allows for the credit risk assessment of bank customers. A single scoring model (scorecard) can be developed for the entire customer population, e.g. using logistic regression. However, it is often expected that segmentation, i.e. dividing the population into several groups and building separate scorecards for them, will improve the model performance. The most common statistical ...

متن کامل

Customer credit quality assessments using data mining methods for banking industries

Personal credit scoring on credit cards has been a critical issue in the banking industry. The bank with the most accurate estimation of its customer credit quality will be the most profitable. The study aims to compare quality prediction models from data mining methods, and improve traditional models by using boosting and genetic algorithms (GA). The predicting models used are instant-based cl...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2017